Search results for "multi-armed bandit"

Showing 8 of 8 documents

Simple learning rules to cope with changing environments

2008

10 pages; International audience. We consider an agent that must choose repeatedly among several actions. Each action has a certain probability of giving the agent an energy reward, and costs may be associated with switching between actions. The agent does not know which action has the highest reward probability, and the probabilities change randomly over time. We study two learning rules that have been widely used to model decision-making processes in animals: one deterministic and the other stochastic. In particular, we examine the influence of the rules' learning rate on the agent's energy gain. We compare the performance of each rule with the best performance attainable when the agent …
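A constant-step-size ("linear operator") update is a minimal sketch of the kind of learning rule the abstract describes. Everything below — the greedy action choice, the learning-rate value, and the mid-run probability swap — is an illustrative assumption, not the paper's actual setup.

```python
import random

def update_estimate(estimate, reward, learning_rate):
    """Constant step-size update: recent rewards weigh more than old
    ones, so the estimate can track drifting reward probabilities."""
    return estimate + learning_rate * (reward - estimate)

random.seed(0)
# Two actions whose true reward probabilities swap halfway through.
probs = [0.8, 0.2]
estimates = [0.5, 0.5]
alpha = 0.1  # the learning rate; this value is arbitrary
for t in range(2000):
    if t == 1000:
        probs = [0.2, 0.8]  # the environment changes
    arm = max(range(2), key=lambda a: estimates[a])  # deterministic (greedy) choice
    reward = 1.0 if random.random() < probs[arm] else 0.0
    estimates[arm] = update_estimate(estimates[arm], reward, alpha)
```

A larger learning rate tracks changes faster but makes the estimates noisier — the trade-off the abstract says the paper examines.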

Keywords: multi-armed bandit; learning rules; dynamic environments; decision-making; animal behavior; error-driven learning; computer simulation; artificial intelligence; ecology; quantitative methods. Research Article.

Thompson Sampling Guided Stochastic Searching on the Line for Non-stationary Adversarial Learning

2015

This paper reports the first known solution to the N-Door puzzle when the environment is both non-stationary and deceptive (adversarial learning). The Multi-Armed Bandit (MAB) problem is the iconic representation of the exploration versus exploitation dilemma. In brief, a gambler repeatedly selects and plays one out of N possible slot machines, or arms, and either receives a reward or a penalty. The objective of the gambler is to locate the most rewarding arm to play, while maximizing his winnings in the process. In this paper we investigate a challenging variant of the MAB problem, namely the non-stationary N-Door puzzle. Here, instead of directly observing the reward, the gambler is only…
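Thompson sampling, which guides the search in this paper, can be sketched for a plain Bernoulli bandit: each arm keeps a Beta posterior over its reward probability, and the arm with the largest posterior draw is pulled. The arm probabilities below are hypothetical, and this sketch does not model the paper's N-Door twist of only observing rewards indirectly.

```python
import random

def thompson_pull(successes, failures):
    """Pick the arm whose sampled Beta posterior draw is largest."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

random.seed(1)
true_probs = [0.3, 0.7, 0.5]  # hypothetical arm reward probabilities
S = [0, 0, 0]  # observed rewards per arm
F = [0, 0, 0]  # observed penalties per arm
for _ in range(3000):
    arm = thompson_pull(S, F)
    if random.random() < true_probs[arm]:
        S[arm] += 1
    else:
        F[arm] += 1
best = max(range(3), key=lambda a: S[a] + F[a])  # most-pulled arm
```

Because posterior draws for well-explored poor arms rarely exceed those of the best arm, pulls concentrate on the most rewarding arm over time.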

Keywords: multi-armed bandit; Thompson sampling; Bayesian inference; reinforcement learning; adversarial systems; artificial intelligence. Published in: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA).

An AI for dominion based on Monte-Carlo methods

2014

Master's thesis in Information and Communication Technology, IKT590, University of Agder, 2014. To the best of our knowledge there exists no Artificial Intelligence (AI) for Dominion which uses Monte Carlo methods that is competitive on a human level. This thesis presents such an AI, and tests it against some of the top Dominion strategies available. Although in a limited testing environment, the results show that our AI is capable of competing with human players, while keeping processing time per move at an acceptable level for human players. Although the approach for our AI is built on previous knowledge about Upper Confidence Bounds (UCB) and UCB applied to Trees (UCT), an approach for handling the st…
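The UCB1 index underlying both UCB and UCT selects the arm maximizing the empirical mean reward plus an exploration bonus that shrinks as the arm is tried more often. A minimal sketch (the counts and means are made-up numbers, not from the thesis):

```python
import math

def ucb1(counts, means, total_pulls):
    """UCB1 index per arm: mean reward plus an exploration bonus
    sqrt(2 ln(t) / n_a) that decays as arm a is pulled more often."""
    return [m + math.sqrt(2.0 * math.log(total_pulls) / c)
            for c, m in zip(counts, means)]

# After 10 total pulls: arm 0 looks better on average, but arm 1 has
# been tried far less, so its exploration bonus can win out.
counts = [8, 2]
means = [0.6, 0.5]
indices = ucb1(counts, means, sum(counts))
arm = max(range(2), key=lambda a: indices[a])
```

In UCT this same index is computed at every node of the game tree to decide which child to descend into during each Monte Carlo rollout.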

Keywords: Dominion; UCT; UCB; AI; Multi-Armed Bandit Problem; Monte-Carlo; Tree Search. Course: IKT590.

Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters

2010

Published version of an article from Lecture Notes in Computer Science. Also available at SpringerLink: http://dx.doi.org/10.1007/978-3-642-13033-5_21 The multi-armed bandit problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change of the reward distributions may progressively degrade the performance of any fixed strategy. Alt…
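A rough sketch of the idea named in the title — one Kalman filter per arm tracking a drifting reward mean, with arms chosen by sampling from each filter's Gaussian state estimate. All class, parameter names, and values here are our own illustrative choices, not the paper's.

```python
import random

class ArmFilter:
    """One scalar Kalman filter per arm, tracking a drifting reward mean."""
    def __init__(self, obs_noise=1.0, drift_noise=0.01):
        self.mean, self.var = 0.0, 100.0  # diffuse prior
        self.obs_noise, self.drift_noise = obs_noise, drift_noise

    def predict(self):
        # Between pulls the arm's true mean may drift, so variance grows.
        self.var += self.drift_noise

    def update(self, reward):
        gain = self.var / (self.var + self.obs_noise)
        self.mean += gain * (reward - self.mean)
        self.var *= (1.0 - gain)

    def sample(self):
        return random.gauss(self.mean, self.var ** 0.5)

random.seed(2)
filters = [ArmFilter(), ArmFilter()]
true_means = [0.0, 1.0]  # hypothetical (here fixed) arm reward means
for _ in range(500):
    for f in filters:
        f.predict()
    arm = max(range(2), key=lambda a: filters[a].sample())  # Thompson-style choice
    reward = random.gauss(true_means[arm], 1.0)
    filters[arm].update(reward)
```

The drift-noise term keeps each filter's variance from collapsing to zero, so exploration never fully stops — which is exactly what a non-stationary problem demands.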

Keywords: multi-armed bandit; Kalman filter; Thompson sampling; Bayesian inference; optimal decision; mathematical optimization; knowledge-based systems.

Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning

2015

The multi-armed bandit problem has been studied for decades. In brief, a gambler repeatedly pulls one out of N slot machine arms, randomly receiving a reward or a penalty from each pull. The aim of the gambler is to maximize the expected number of rewards received when the probabilities of receiving rewards are unknown. Thus, the gambler must, as quickly as possible, identify the arm with the largest probability of producing rewards; this compactly captures the exploration-exploitation dilemma in reinforcement learning. In this paper we introduce a particularly challenging variant of the multi-armed bandit problem, inspired by the so-called N-Door Puzzle. In this variant, the gambler is only tol…

Keywords: multi-armed bandit; Thompson sampling; Bayesian inference; reinforcement learning; artificial intelligence; searching on the line.

Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game

2012

Published version of an article in the journal: Applied Intelligence. Also available from the publisher at: http://dx.doi.org/10.1007/s10489-012-0346-z The two-armed bandit problem is a classical optimization problem where a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Bandit problems are particularly fascinating because a large class of real world problems, including routing, Quality of Service (QoS) control, game playing, and resource allocation, can be solved …
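In the Goore Game referenced in the title, players repeatedly vote yes or no, and each player receives a stochastic reward whose probability depends only on the global fraction of yes votes. A toy sketch with Beta-Bernoulli (Thompson-sampling) players follows; the payoff shape, target, and all parameters are illustrative, not the paper's.

```python
import random

def goore_reward(yes_fraction, target=0.7):
    """Common reward probability: highest when the fraction of 'yes'
    votes hits the target. The unimodal shape here is illustrative."""
    return max(0.0, 1.0 - 2.0 * abs(yes_fraction - target))

random.seed(3)
n_players = 10
# Each player is an independent Beta-Bernoulli two-armed bandit (no/yes).
S = [[0, 0] for _ in range(n_players)]  # rewards per (player, vote)
F = [[0, 0] for _ in range(n_players)]  # penalties per (player, vote)
for _ in range(4000):
    votes = []
    for p in range(n_players):
        draws = [random.betavariate(S[p][a] + 1, F[p][a] + 1) for a in (0, 1)]
        votes.append(1 if draws[1] > draws[0] else 0)
    prob = goore_reward(sum(votes) / n_players)
    for p in range(n_players):
        if random.random() < prob:
            S[p][votes[p]] += 1
        else:
            F[p][votes[p]] += 1
```

No player sees the target or the other players' votes; coordination emerges only through the shared reward signal, which is what makes the game a model of decentralized decision making.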

Keywords: two-armed bandit; Bayesian learning; Goore Game; decentralized decision making; quality of service (QoS) control; resource allocation; wireless sensor networks; influence diagrams; artificial intelligence. Published in: Applied Intelligence.

Exact simulation of diffusion first exit times: algorithm acceleration

2020

In order to describe or estimate different quantities related to a specific random variable, it is of prime interest to be able to generate such a variate numerically. In some situations, exact generation of random variables is either unavailable or too expensive in terms of computation time, and must therefore be replaced by an approximation procedure. Until recently, this was the case for the exact simulation of exit times of diffusion processes, even though such exit times arise in many fields, including mathematical finance, neuroscience, and reliability. The usual way to describe exit times was to use discretization schemes, which are of course approxim…
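Rejection sampling, one of this paper's keywords, underlies many exact-simulation schemes: proposals from an easy distribution are accepted with a probability proportional to the target density, yielding draws exactly from the target. A generic sketch on a toy density follows; the Beta(2,2) target and the bound are illustrative, and exact exit-time simulation for diffusions is far more involved.

```python
import random

def rejection_sample(target_pdf, proposal_sample, proposal_pdf, bound):
    """Draw exactly from target_pdf: accept a proposal x with
    probability target_pdf(x) / (bound * proposal_pdf(x))."""
    while True:
        x = proposal_sample()
        if random.random() * bound * proposal_pdf(x) <= target_pdf(x):
            return x

random.seed(4)
# Toy target: Beta(2,2) density on [0,1], proposed from Uniform(0,1).
# The density peaks at 6 * 0.5 * 0.5 = 1.5, so bound = 1.5 is tight.
target = lambda x: 6.0 * x * (1.0 - x)
draws = [rejection_sample(target, random.random, lambda x: 1.0, 1.5)
         for _ in range(20000)]
mean = sum(draws) / len(draws)
```

No discretization error enters anywhere: every accepted draw has exactly the target distribution, which is the sense of "exact simulation" in the title.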

Keywords: exit time; Brownian motion; diffusion processes; rejection sampling; exact simulation; multi-armed bandit; randomized algorithm. MSC: primary 65C05; secondary 60G40, 68W20, 68T05, 65C20, 91A60, 60J60.

Arm Space Decomposition as a Strategy for Tackling Large Scale Multi-armed Bandit Problems

2013

Recent multi-armed bandit based optimization schemes provide near-optimal balancing of arm exploration against arm exploitation, allowing the optimal arm to be identified with probability arbitrarily close to unity. However, the convergence speed drops dramatically as the number of bandit arms grows large, simply because singling out the optimal arm requires experimentation with all of the available arms. Furthermore, effective exploration and exploitation typically demands computational resources that grow linearly with the number of arms. Although the former problem can be remedied to some degree when prior knowledge about arm correlation is available, the latter problem persists. In this…
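The decomposition idea can be caricatured as: partition the arms among agents, let each agent run its own bandit scheme over its subset, then resolve among the subset champions. A toy two-stage sketch with Thompson sampling follows; the arm probabilities, subset sizes, and the champions-round are illustrative simplifications — the abstract suggests the paper's actual mechanism is more involved (the keywords point to game theory and Nash equilibria).

```python
import random

def run_ts(probs, pulls):
    """Thompson sampling restricted to one subset of arms; returns the
    index (within the subset) of the most-pulled arm."""
    n = len(probs)
    S, F = [0] * n, [0] * n
    for _ in range(pulls):
        a = max(range(n), key=lambda i: random.betavariate(S[i] + 1, F[i] + 1))
        if random.random() < probs[a]:
            S[a] += 1
        else:
            F[a] += 1
    return max(range(n), key=lambda i: S[i] + F[i])

random.seed(5)
all_probs = [0.1, 0.4, 0.9, 0.3, 0.2, 0.6]  # six hypothetical arms
subsets = [[0, 1, 2], [3, 4, 5]]             # decomposed arm space
# Each agent finds its local champion; a final bandit over the
# champions then singles out the global best arm.
champions = [subset[run_ts([all_probs[i] for i in subset], 2000)]
             for subset in subsets]
winner = champions[run_ts([all_probs[c] for c in champions], 2000)]
```

Each agent only ever touches its own subset, so per-agent computation stays bounded as the total number of arms grows — the linear-cost problem the abstract says persists for monolithic schemes.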

Keywords: multi-armed bandit; Thompson sampling; game theory; Nash equilibrium; multi-agent systems; sampling; mathematical optimization. Published in: 2013 12th International Conference on Machine Learning and Applications.